Under constant applied force, the separation of double-stranded DNA into two single strands is
known to proceed through a series of pauses and jumps. Given experimental traces of constant-force
unzipping, we present a method whereby the locations of pause points can be extracted in the
form of a pause point spectrum. A simple theoretical model of DNA constant-force unzipping is
demonstrated to produce good agreement with the experimental pause point spectrum of lambda
phage DNA. The locations of peaks in the experimental and theoretical pause point spectra are
found to be nearly coincident below 6000 bp. The model requires only the sequence, the temperature,
and a set of empirical base pair binding and stacking energy parameters, and the good agreement
with experiment suggests that pause points are primarily determined by the DNA sequence. The
model is also used to predict pause point spectra for the bacteriophage PhiX174 genome. The
algorithm for extracting the pause point spectrum might also be useful for studying related systems
that exhibit pausing behavior, such as molecular motors.



I. INTRODUCTION 
The unbinding of double-stranded DNA (dsDNA) into single-stranded DNA (ssDNA) is a ubiquitous event
central to many cellular processes. Much research has focused on understanding the thermal unbinding of dsDNA
(Wartell and Benight, 1985). These studies have revealed quantitative aspects of the thermal unbinding
transition through the extraction of sequence-dependent free energy differences between bound and unbound DNA
(Blossey and Carlon, 2003; Rouzina and Bloomfield, 1999a,b; SantaLucia et al., 1996). In living cells, however, the
unbinding of dsDNA is typically achieved using molecular motors which utilize chemical energy and exert forces to pull
apart the strands of dsDNA. To have a quantitative understanding of these processes it is important to first study the
simpler case of unbinding of dsDNA by a constant external force. This process is typically referred to as 'unzipping'
of dsDNA. For early experiments which unzip lambda phage DNA with a constant velocity and a fluctuating force,
see (Bockelmann et al., 2002).
Recently, single-molecule experiments have allowed study of this process (see Fig. 1 for a schematic illustration of the
experiment). Both theory (Bhattacharjee, 2000; Cocco et al., 2001, 2002; Kafri et al., 2002; Lubensky and Nelson,
2000, 2002; Marenduzzo et al., 2002; Nelson, 2003; Sebastian, 2000) and experiments (Danilowicz et al., 2003b) show
that at a given temperature the dsDNA separates into ssDNA when the applied force exceeds a critical value $F_c$.
Moreover, for forces near $F_c$, the dynamics of the unzipping process is highly irregular (Danilowicz et al., 2003a), as
displayed in the time evolution of the junction between the separated ssDNA and the bound dsDNA. This junction is
referred to as the unzipping fork. Rather than a smooth time evolution, the position of the unzipping fork progresses
through a series of long pauses separated by rapid bursts of unzipping.

Pauses and jumps in constant-force unzipping can have several origins. For temperatures near the dsDNA melting
transition, portions of the dsDNA can unbind and form transient 'bubbles' below the unzipping fork. In addition,
because of the helical nature of the dsDNA structure, a natural twist can be accumulated during the force-induced
unzipping process. If the unzipping fork encounters a thermal bubble in the course of its progress, then we would expect
a jump in its position. Furthermore, if the unzipping occurs on time scales much faster than the time scales associated
with untwisting, then one would expect pauses when the DNA has to unravel accumulated twist (Thomen et al.,
2002). Moreover, since AT and CG base pairs (bp) have different interaction strengths (and associated base pair
stacking energies; see Table I), pauses and jumps could also be due to effects associated with the particular sequence
of the DNA.

Experiments on multiple identical copies of the same DNA have shown that locations of pauses are highly conserved
from one strand to another (Danilowicz et al., 2003a). Hence, it seems likely that in these experiments at least,
transient bubbles and accumulated twist play only a minor role in determining the jumps and pauses. In this work we
study the location of the pause points both experimentally and theoretically. To facilitate this study, we introduce a
pause point spectrum which is a function of the number of base pairs unzipped. The locations of peaks in the pause
point spectrum signify the location of pause points in unzipping, and peak areas can be used as a measure of the
strength of pause points. We predict pause point locations by adapting a model of the dynamics of the unzipping
fork in a constant-force unzipping experiment on heterogeneous DNA (Lubensky and Nelson, 2002). The only input
information into the analysis is the DNA sequence, free-energy differences between dsDNA and ssDNA obtained using



FIG. 1 Schematic diagram of the DNA constant-force unzipping experiment. One strand of the dsDNA is attached to a fixed 
support, typically via a linker DNA strand (not shown). The other strand is pulled with a constant force, F, via a magnetic 
bead (not to scale) attached to the strand. If the force is large enough, the dsDNA separates into two ssDNA strands. The 
position of this separation, measured in base pairs opened $m(t)$, locates the unzipping fork. See Figure 2 for a more detailed
description of the experimental setup used in this paper.

melting experiments, and temperature. Both thermal bubbles and build up of twist are ignored within our treatment.
We find that we can predict most experimentally observed pause points, thus confirming that pause point locations
are primarily a function of sequence. Our algorithm might also prove useful for analyzing pause points arising in other
single molecule experiments, such as RNA polymerase and exonuclease (Davenport et al., 2000; Neuman et al., 2003;
Perkins et al., 2003; Wang et al., 1998).

The paper is organized as follows: In Section II we describe constant-force unzipping experiments performed on
lambda phage DNA. In Section III we present an algorithm for constructing a pause point spectrum from experimental
traces of unzipping fork position versus time. Section IV.A describes a theoretical model of DNA constant-force
unzipping which defines a free energy landscape as a function of the number of bases unzipped, $m$, used to describe the
unzipping process. In Section IV.B this free energy landscape is used as a surface on which to perform Monte Carlo
simulations mimicking the unzipping experiments. In the same way as performed for experimental unzipping
trajectories, these trajectories are combined to form theoretical pause point spectra, which are compared with experiment
in Section V.



II. EXPERIMENTAL METHOD 



The experimental procedure has been discussed previously in (Assi et al., 2002; Danilowicz et al., 2003a). As
shown in Figure 2, our setup consisted of two pieces of lambda phage DNA, covalently bound to each other. One
strand of DNA was used as a spacer between the glass capillary and the other strand, which was to be unzipped. The 
spacer strand of DNA was attached to the capillary with a digoxigenin/anti-digoxigenin antibody bond. The capillary 
was coated with digoxigenin antibody while the spacer strand of DNA was hybridized with a digoxigenin labeled 
oligonucleotide. One end of the strand of DNA to be unzipped was hybridized and ligated with a hairpin to prevent 
the complete separation of the unzipped DNA molecule. The other end of the strand was hybridized and ligated with 
a biotinylated oligonucleotide which specifically bound to a streptavidin-coated super-paramagnetic bead. When a 
magnetic field was applied, the force induced on the bead slowly unzipped the DNA. The unzipping direction (order 
of nucleotides unzipped) was controlled by the selection of oligonucleotides. 

Figure 2(B) shows the round, antibody coated capillary inside an uncoated square microcell. The round capillary was
0.5 mm in diameter, while the square microcell was 0.8 mm across, leaving a space for a solution of DNA, beads and 
buffer inside the microcell but outside the sealed, empty, round capillary. The capillary was incubated in a solution 
containing digoxigenin antibody at 5°C for at least two days. The DNA solution and bead suspension were also 
individually kept at 5°C prior to the experiment. We inserted a digoxigenin antibody coated capillary and the DNA 
and bead suspension into the microcell and then incubated it at 37°C for 45 minutes, allowing the DNA to bind to 
the capillary via the antigen-antibody bond. Finally, we rotated the microcell so the beads that had settled on top 
of the round capillary were hanging off of its side. We focused using a microscope objective on the beads that were 
attached to the outermost point on the capillary so that we could accurately measure the distance between the beads 
FIG. 2 Molecular construction and square cell. (A) Schematic of the DNA binding to the inner glass capillary and the magnetic
bead such that pulling the bead away from the surface will cause the dsDNA shown on the right side of the diagram to be 
separated into two single DNA strands. Note that the figure is not to scale, considering that lambda DNA contains 48,502 bp. 
(B) Schematic of the side view of the square capillary containing the round glass capillary to which the DNA molecules are 
bound. The magnetic tweezer apparatus exerts the controlled force on the magnetic beads, a microscope is used for observation, 
and two thermoelectric coolers are used to control the temperature of the sample during the initial incubation. The magnetic 
beads are pulled to the right in a direction parallel to the bottom and top surfaces of the square capillary, and perpendicular to 
the surface of the round capillary at a height equal to the radius of the round capillary, where we focus the microscope. This 
design allows us to view DNA molecules that are offset from the surfaces of the square capillary, and to infer the number of 
separated base pairs (bp) by measuring the separation between the magnetic bead and the surface of the round capillary. 



in our field of view and the capillary. 

We applied a magnetic force by bringing a stack of small magnets mounted on an xyz translation stage near the 
microcell. The magnets could be approximated as a solenoid with its long axis in the z-direction, so the field gradient
acted in the z-direction only and was essentially uniform (Assi et al., 2002) over our field of view, which was much
smaller than the solenoid radius. 

We measured the distance the DNA molecules had unzipped by tracking their attached beads. We took still digital 
photographs of the field of view through a 10x objective lens once every 10 seconds. An image processing program
found the coordinates of each bead in each frame. Figure 3 shows part of our field of view at two different times.
In Figure 3(a), we had just applied the magnetic field. Figure 3(b) shows the same beads 28 frames, or just over 3
minutes later. 

In each experiment, we saw approximately 50 beads in our field of view. Approximately 10 beads unzipped slowly 
over the course of the experiment, pausing at various points. An individual bead paused at fixed extension until, 
through thermal fluctuations, the unzipping proceeded. After overcoming an energy barrier which we attribute to 
sequence heterogeneity, the strand unzipped up to the next pause point, where the same process repeated. Pause 
points seemed very reproducible in experiment from bead to bead (each attached to a genetically identical DNA), 
even when force and temperature varied considerably. This statement will be quantified below. 

In order to compare simulation results to experimental data, we converted microns unzipped to the number of base
pairs unzipped. The centers of beads attached to fully zipped strands of lambda phage DNA under a force near 15
pN were observed 16.5 μm from the round capillary in experiments. The centers of beads attached to fully unzipped
strands of lambda phage DNA under a force of 15 pN were observed 77.4 μm from the round capillary in experiments.
Thus DNA strands being unzipped were stretched to a length of 60.9 μm. Lambda phage DNA is 48,502 base pairs
long, so to convert from μm to base pairs, we use the conversion factor 48,502 bp / 60.9 μm ≈ 800 bp/μm. Since two
strands of ssDNA are produced during unzipping, the monomer spacing along a ssDNA strand is found from the
FIG. 3 Photographs from above the square capillary shown in Figure 2(B). The dark bar on the left is the inner round capillary
to which the beads are tethered. The dark dots are the beads. (a) shows the beads immediately after a 15 pN force was
introduced. Most of the beads are in the fully zipped position, approximately 17 μm from the surface of the capillary. (b)
shows the beads three minutes later. More DNA strands have unzipped, causing the beads to jump farther from the round
capillary. 

inverse of this factor divided by two to be $a \approx 0.6$ nm. Such a linear interpolation seems reasonable given the fairly
large forces (~15 to 20 pN) acting on the unzipped 'handles'.
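
The arithmetic behind this conversion can be checked with a few lines of Python. This is only an illustrative sketch: the constant and function names are hypothetical, and the numbers are the ones quoted in this section.

```python
# Illustrative check of the micron-to-base-pair conversion described above.
# All names are hypothetical; the numbers are those quoted in the text.

LAMBDA_LENGTH_BP = 48502     # length of lambda phage DNA in base pairs
ZIPPED_OFFSET_UM = 16.5      # bead position for fully zipped DNA (microns)
UNZIPPED_OFFSET_UM = 77.4    # bead position for fully unzipped DNA (microns)

# Extension gained when the whole molecule is unzipped (about 60.9 microns).
STRETCH_UM = UNZIPPED_OFFSET_UM - ZIPPED_OFFSET_UM

# Conversion factor, roughly 800 bp per micron.
BP_PER_UM = LAMBDA_LENGTH_BP / STRETCH_UM

# Each unzipped base pair frees two ssDNA monomers, hence the division by two.
MONOMER_SPACING_NM = 1000.0 / BP_PER_UM / 2.0


def microns_to_bp(distance_um):
    """Convert a bead displacement in microns to base pairs unzipped."""
    return distance_um * BP_PER_UM


if __name__ == "__main__":
    print(f"conversion factor: {BP_PER_UM:.1f} bp/um")
    print(f"monomer spacing:   {MONOMER_SPACING_NM:.3f} nm")
    print(f"0.5 um resolution: {microns_to_bp(0.5):.0f} bp")
```

The unrounded factor is slightly below 800 bp/μm, which is why a 0.5 μm resolution corresponds to about 400 bp.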

III. PAUSE POINT ALGORITHM 

Figure 5(b-c) displays several experimental unzipping trajectories. As can be seen in the trajectories, the unzipping
fork progresses through long pauses at specific locations, separated by rapid transitions between these pauses.
Moreover, a sample of trajectories from identical DNA sequences displays a uniformity in the locations at which the
DNA unzipping pauses. Note also that pauses at certain locations seem consistently longer than others. From these
considerations, we are motivated to develop a method for combining many trajectories to form a distribution reflecting 
the location and relative strengths of pause points. 

A pause point 'spectrum' can be computed as follows (see Figure 4 for an example):

1. Create a histogram (area normalized to 1) based on the position of the unzipping fork during the time duration 
of the experiment for all trajectories using the highest resolution possible. 

2. In order to smooth this histogram according to the real experimental resolution, define a window centered 
around each position of the histogram, with a width equal to the experimental resolution. 

3. Compute the average histogram peak height within this window, and assign the value of this average to the 
position of the center of the window in the pause point spectrum. 
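
The three steps above can be sketched in Python as follows. This is a minimal illustration rather than the code used for the analysis: the function name and arguments are hypothetical, positions are assumed to be integer base-pair indices, and truncating the window at the ends of the position axis is an assumption that the text does not specify.

```python
def pause_point_spectrum(trajectories, n_positions, window):
    """Sketch of the three-step window average (hypothetical helper).

    trajectories: list of trajectories, each a list of integer fork
                  positions (bp), one entry per time sample.
    n_positions:  number of 1 bp histogram bins.
    window:       window width in bp (the experimental resolution).
    Returns (histogram, spectrum) as lists of length n_positions.
    """
    # Step 1: high-resolution histogram of fork position, normalized to area 1.
    counts = [0] * n_positions
    total = 0
    for trajectory in trajectories:
        for m in trajectory:
            if 0 <= m < n_positions:
                counts[m] += 1
                total += 1
    if total == 0:
        raise ValueError("no samples fall inside the histogram range")
    histogram = [c / total for c in counts]

    # Prefix sums make every window average an O(1) operation.
    prefix = [0.0]
    for h in histogram:
        prefix.append(prefix[-1] + h)

    # Steps 2 and 3: center a window on each position and assign the mean
    # histogram height inside it to that position.
    half = window // 2
    spectrum = []
    for center in range(n_positions):
        lo = max(0, center - half)                 # assumed edge handling
        hi = min(n_positions, center + half + 1)
        spectrum.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return histogram, spectrum


if __name__ == "__main__":
    # Two toy trajectories that pause at positions 3 and 7.
    toy = [[0, 3, 3, 3, 7, 7], [0, 3, 3, 7, 7, 7]]
    hist, spec = pause_point_spectrum(toy, n_positions=10, window=3)
    print([round(s, 3) for s in spec])
```

In the toy example each pause contributes a peak whose width is set by the window, which mirrors how sharply localized pauses appear as broad peaks at the experimental resolution.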

The locations of the peaks in a pause point spectrum correspond to the distances at which the trajectories paused,
and thus correspond to pause points. In addition, the peak area is proportional to the amount of time trajectories
spent at the peak locations. Hence, relative peak area can be used as a measure of the relative strength of pause
points. As can be seen from direct comparison between the spectrum and trajectories (Figure 5), this method of
analysis provides an excellent summary of pauses and jumps observed in experiment. From this intuition, we expect
that higher experimental spatial resolution would allow us to resolve further peaks in the spectrum.

The experimental pause point spectrum was computed using 15 unzipping trajectories of which 10 are shown in
Figure 5(b-c). In addition to applying the above algorithm, the following steps were taken: Trajectories were first
FIG. 4 Sample window averaged results. The high resolution (1 bp) pause time histogram is shown in black. The grey spectrum
is created by sliding a window of size 400 bp along the x-axis, and assigning a y-value to the midpoint of the window equal 
to the average of the high resolution histogram within that window. Note that highly localized pause points appear as broad 
peaks according to a much lower resolution in the window average. 
FIG. 5 (a) Experimental pause point spectrum alongside sample experimental unzipping trajectories at $T = 298$ K (25°C)
with (b) $F = 15$ pN and (c) $F = 20$ pN. The experimental spatial resolution is 0.5 μm (about 400 bp), which determined the
size of the window used to calculate the spectrum. The structure of some peaks indicates multiple underlying pause points that
are only partially resolved at this resolution. The full experimental pause point spectrum was computed with 15 experimental
trajectories, only 10 of which are shown.

individually shifted so that the starting position of each trajectory was zero μm. This was done in order to subtract
the linker length from the measured distance for each bead. In order to collect as many trajectories as possible for
better statistics, the experimental resolution was 0.5 μm. The spectrum was thus created with a window of 0.5 μm,
corresponding to about 400 bp. In each of the experimental trajectories, there was a period in the beginning that
was noisy due to transient bead adjustments in the turning on of the magnetic field (Figure 5). These regions were
not included in the experimental spectra. To convert from μm to bp, we multiplied by (48502 bp / 60.9 μm), which
represents the appropriate bp/μm factor for lambda phage under these experimental conditions (Section II).



IV. THEORETICAL STUDY OF PAUSE POINTS 

A. Defining the Free Energy Landscape 

We consider the DNA unzipping experiment (Figure 1) as a chemical reaction from dsDNA → 2 ssDNA. This
system possesses a natural one-dimensional reaction coordinate, namely the number of base pairs (bp) unzipped, $m$, or
equivalently the spatial location of the unzipping fork. Theoretical descriptions can then be naturally reduced from a
complicated, three-dimensional system to a one-dimensional description in which some of the interactions are
renormalized to account for the three-dimensional nature of the problem.

A very simple one-dimensional effective model (Lubensky and Nelson, 2000, 2002) can then be written down based
on the above picture. We define a free energy as a function of the number of bases unzipped, $\mathcal{E}(m)$, which represents
the difference in free energy between the state with $m$ base pairs unzipped and the fully zipped state ($\mathcal{E}(m=0) = 0$).
There is a contribution to $\mathcal{E}(m) - \mathcal{E}(m-1)$ from the free energy difference between the bound and unbound $m$th base
pair, $\Delta G_{bp} = k_B T \eta(m)$, as well as a contribution due to adding two additional monomers to the free ssDNA strands
under a tension, $F$, denoted as $2 k_B T g(F)$. We can write these contributions as

$$\frac{\mathcal{E}(m) - \mathcal{E}(m-1)}{k_B T} = 2g(F) + \eta(m). \qquad (1)$$

The DNA sequence information is stored in the function $\eta(m)$. In principle, this function might depend on time due to
transient bubbles or twists in the DNA. At forces low enough ($F < 5$ pN) such that hairpin formation in the unzipped
handles is possible (Dessinges et al., 2002; Montanari and Mezard, 2001), $g(F)$ could also be sequence dependent.
These complications are neglected here. If we iterate equation (1) until we reach $m = 0$, we have

$$\frac{\mathcal{E}(m)}{k_B T} = 2g(F)\,m + \sum_{n=0}^{m} \eta(n). \qquad (2)$$


Here we are setting $\eta(0) = 0$, which ensures $\mathcal{E}(0) = 0$. The long DNA sequences we are considering should be
insensitive to such edge effects. 

Thermodynamically we expect in equilibrium that unzipping of a dsDNA molecule of $M$ bases will occur when
$\mathcal{E}(M) < \mathcal{E}(0)$, with the transition region between ds and ss DNA occurring when $\mathcal{E}(M) = \mathcal{E}(0)$. If we define the
shifted function $\tilde{\eta}(n) = \eta(n) - \bar{\eta}$, where $\bar{\eta}$ is the average of $\eta(n)$ over the sequence, we can rewrite equation (2) in the form

$$\frac{\mathcal{E}(m)}{k_B T} = f m + \sum_{n=0}^{m} \tilde{\eta}(n), \qquad (3)$$

$$f = 2g(F) + \bar{\eta}. \qquad (4)$$

The parameter $f$ is a reduced force, which defines the overall tilt of the free energy landscape (FEL), with the
particular base sequence overlaid on this tilt through the function $\sum_{n=0}^{m} \tilde{\eta}(n)$. With this definition, the critical reduced
force is defined by the equation $f = 0$. Values of $f > 0$ represent forces too low to unzip, while values $f < 0$ represent
forces where full equilibrium unzipping is thermodynamically favorable. In all of the above, we have assumed that
thermal bubbles do not form under the experimental unzipping conditions. This simple model can be defined in a
more rigorous fashion by integrating out three-dimensional degrees of freedom in a statistical mechanical microscopic
definition of the system. Effects due to bubbles can be incorporated into a coarse grained model, with renormalized
parameters (Lubensky and Nelson, 2002).

It is interesting to note that even for this simple model, non-trivial phenomena can occur due to the buildup of
free energies naturally present in equation (2). Indeed, for the case of a completely random base sequence of length $M$, a sum
over the independent random variables in (2) would give an energy barrier $\sim k_B T \sqrt{M}$ (Lubensky and Nelson, 2002).
Since GC base pairs are $\sim k_B T$ stronger than AT pairs at room temperature (Table I), we expect large peaks to
appear due to the presence of long GC-rich regions. For a sequence of length $M = 48{,}000$, one expects barriers of
the order of $200\,k_B T$.

In practice, FELs are computed for a particular genome sequence using the experimentally determined free energies
of base quartet formation (SantaLucia et al., 1996). There are 10 distinct base quartets, where $m$ now represents
the $m$th base quartet, while $\eta(m)$ represents this base quartet's free energy (Table I). These free energy parameters
are determined through thermal denaturation studies on short dsDNA fragments, and were found for temperatures of
around 310 K. By using base quartet free energies, base stacking interactions are included, which are thought to be
more important for overall dsDNA stability than the hydrogen bonds in between base pairs (Blossey and Carlon, 2003;
Grosberg and Khokhlov, 1994). To compute FELs for different temperatures, the free energies for a given quartet
were calculated from $k_B T \eta(m) = \Delta G_{qt} = \Delta H_{qt} - T \Delta S_{qt}$. Here $\Delta H_{qt}$ ($\Delta S_{qt}$) is the enthalpy (entropy) difference
between the bound and unbound DNA base quartet. Once temperature and $f$ are specified, the FEL is computed
with equation (2) or (3).
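
As an illustration of how a landscape follows from a sequence, the sketch below builds $\mathcal{E}(m)/k_B T$ from the quartet free energies of Table I (copied as printed in this document, in units of $k_B T$ at 298 K). The function names are hypothetical, the six steps not listed in the table are obtained from their reverse complements, and $\eta(0) = 0$ is excluded from the average as an edge convention assumed here.

```python
# Quartet free energies in units of k_B T at 298 K, as printed in Table I.
QUARTET_ENERGY = {
    "GC": 4.46, "CG": 4.22, "GG": 3.46, "GA": 2.79, "GT": 2.96,
    "CA": 2.79, "CT": 2.20, "AA": 2.31, "AT": 1.52, "TA": 1.33,
}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def quartet_energy(step):
    """Free energy of a 5'-XY-3' step, using complementarity if needed."""
    if step in QUARTET_ENERGY:
        return QUARTET_ENERGY[step]
    # The same quartet read along the opposite strand (reverse complement).
    reverse_complement = COMPLEMENT[step[1]] + COMPLEMENT[step[0]]
    return QUARTET_ENERGY[reverse_complement]


def free_energy_landscape(sequence, f):
    """Return (landscape, eta_bar) with landscape[m] = E(m) / k_B T.

    Follows the increment E(m) - E(m-1) = f + eta(m) - eta_bar, which is
    equation (1) rewritten with the reduced force f = 2 g(F) + eta_bar.
    """
    eta = [quartet_energy(sequence[i:i + 2]) for i in range(len(sequence) - 1)]
    eta_bar = sum(eta) / len(eta)
    landscape = [0.0]                      # E(0) = 0 by definition
    for value in eta:
        landscape.append(landscape[-1] + f + (value - eta_bar))
    return landscape, eta_bar


if __name__ == "__main__":
    # Toy sequence with a GC-rich front end followed by an AT-rich tail.
    landscape, eta_bar = free_energy_landscape("GCGCATAT", f=0.0)
    print(f"eta_bar = {eta_bar:.2f}")
    print([round(e, 2) for e in landscape])
```

At $f = 0$ the toy landscape returns to zero at the far end, and the GC-rich front end produces a barrier that peaks where the sequence turns AT-rich.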

The case of lambda phage DNA is particularly interesting since it is known that this genome consists of a GC-rich
half connected to an AT-rich half¹. Using the free energy parameters of (SantaLucia et al., 1996), one finds that



¹ The lambda phage genome can be found at http://www.ncbi.nih.gov/ with sequence accession number NC_001416.



Base Quartet    $\Delta G_{qt}/k_B T$

5'-GC-3'        4.46
CG              4.22
GG              3.46
GA              2.79
GT              2.96
CA              2.79
CT              2.20
AA              2.31
AT              1.52
TA              1.33


TABLE I Base quartet free energies $\Delta G_{qt}$ for the bound to unbound transition for $T = 298$ K (25°C) taken from
(SantaLucia et al., 1996), using $\Delta G_{qt} = \Delta H_{qt} - T \Delta S_{qt}$. Only two nucleotides of the base quartet are shown, the other
two obtained from the usual complementarity A-T and G-C. Free energies are expressed in units of $k_B T = 0.59$ kcal/mol at
$T = 298$ K (25°C).
FIG. 6 Free energy landscape for the lambda phage genome at $f = -0.15$, $T = 298$ K (25°C) corresponding to $F \approx 16$ pN. A
closer view is given in the inset. Because the lambda phage genome splits into a GC-rich 'front end', followed by an AT-rich
region, the effective critical unzipping force is the one shown here, which produces an approximately flat energy landscape for
the first ~20,000 base pairs. Note that there are still energy barriers $\sim 20\,k_B T$ in this region due to sequence heterogeneity.

the large GC-rich region creates a peak of approximately $3000\,k_B T$ at $f = 0$ and $T = 298$ K (25°C), representing an
insurmountable barrier to unzipping, and which is much larger than that expected for a random sequence. For the
lambda phage genome, we thus define an operational critical reduced force of unzipping as the value of $f$ such that
$\mathcal{E}(0) = \mathcal{E}(m_{GC})$, where $m_{GC}$ is the boundary of the GC-rich region. As can be seen in Figure 6, this operational
critical reduced force corresponds to approximately $f = -0.15$. Forces greater than this should allow easier unzipping
since the AT-rich portion of the FEL has a negative slope for these forces. However, even at these large forces, there
are barriers on the order of $20\,k_B T$ to unzip (Figure 6, inset). Using a Freely-Jointed Chain (FJC) model for the
single strands, with monomer spacing $a = 0.6$ nm (corresponding to the bp/μm conversion factor in Section II), and
Kuhn length $b \approx 1.9$ nm (Smith, Cui and Bustamante, 1996), a reduced force value of $f = -0.15$ corresponds to an
experimental force value of $F \approx 16$ pN.
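
The text quotes only the result of the FJC conversion. The sketch below shows one way such a conversion could be carried out, assuming the textbook freely-jointed chain free energy per monomer, $g(F) = -(a/b)\ln[\sinh(x)/x]$ with $x = Fb/k_B T$. This expression is not written out in this paper, so it should be read as an assumption, and the function names and the sample value of $\bar{\eta}$ are hypothetical.

```python
import math

KB_T_PN_NM = 4.114   # k_B T at 298 K in pN nm
A_NM = 0.6           # ssDNA monomer spacing quoted in the text (nm)
B_NM = 1.9           # ssDNA Kuhn length quoted in the text (nm)


def fjc_g(force_pn, a_nm=A_NM, b_nm=B_NM, kbt=KB_T_PN_NM):
    """Free energy per monomer of a stretched FJC, in units of k_B T.

    Assumed textbook form: g(F) = -(a/b) * ln(sinh(x) / x), x = F b / k_B T.
    """
    x = force_pn * b_nm / kbt
    if x < 1e-8:
        return 0.0               # sinh(x)/x tends to 1 as x tends to 0
    return -(a_nm / b_nm) * math.log(math.sinh(x) / x)


def reduced_force(force_pn, eta_bar):
    """Equation (4): f = 2 g(F) + eta_bar, with eta_bar supplied by the caller."""
    return 2.0 * fjc_g(force_pn) + eta_bar


if __name__ == "__main__":
    print(f"g(16 pN) = {fjc_g(16.0):.3f}")
    # eta_bar = 2.82 is a hypothetical sequence average used for illustration.
    print(f"f = {reduced_force(16.0, 2.82):.3f}")
```

Under these assumptions $g(16\ \mathrm{pN})$ is close to $-1.48$, so $f = -0.15$ would require a sequence-averaged $\bar{\eta}$ of roughly 2.8, which lies within the range of the Table I entries.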



B. Dynamics 



To study the location of the pause points one has to study the dynamics related to moving along the chemical 
reaction coordinate, $m$. Macroscopic unzipping occurs only if $f < 0$, when the equilibrium state of the system is
unbound. In this case, the experimental traces of DNA unzipping represent the approach of the system toward its 
equilibrium single-stranded state. 



As outlined in (Lubensky and Nelson, 2002), there are four dynamical time scales associated with DNA constant-force
unzipping in the setup shown in Figure 1: $\tau_{end}$ and $\tau_{bulk}$ represent base pairing and unpairing at the end of the
strand and in the bulk, respectively; $\tau_{ss}(m)$ represents the relaxation time of the liberated single strands; and $\tau_{rot}(m)$
represents the relaxation time of twist built up in the zipped portion of the strand due to the helical nature of the
DNA. The latter two time scales vary as a function of $m$. The dynamics of unzipping are determined by the slowest
of these time scales. Here we assume that this time scale, for any value of $m$, is related to the unbinding of base pairs.

In the analysis below, we assume that the slowest timescale is $m$-independent. Furthermore, it can be argued
that bubble formation is suppressed in DNA for the relevant experimental conditions because of strong base stacking
interactions (Blossey and Carlon, 2003).

We will be interested in the unzipping dynamics for $f < 0$, that is for forces above the critical force of unzipping where
it is thermodynamically favorable to unzip. However, even under these conditions, the approach to thermodynamic
equilibrium is far from simple. Smooth progress of the unzipping fork is hampered by very large energy barriers in
$\mathcal{E}(m)$ that can be caused by the buildup of positive $\tilde{\eta}(m)$. As mentioned above, for random DNA sequences of length
$M$, these barriers can grow as $\sqrt{M}$. Forces slightly above the critical force are unable to remove these barriers through
tilting the landscape, and we expect to observe difficulty in traversing these barriers. Since the barriers are sequence
dependent, it is possible that the dynamics of unzipping display characteristic signatures of the sequence.

To study the behavior of a particular DNA sequence, and to make direct contact with experiments, it is useful to 
have a dynamical model that closely mimics the experiment. This can be achieved most simply through Monte Carlo 
(MC) simulations of a random walker on the one-dimensional FEL for the specific DNA sequence under study, at the 
specified reduced force and temperature conditions. The position of the walker on the FEL represents the position of 
the unzipping fork in experiments. The walker moves from position $m$ to a nearest neighbor position $m \pm 1$ with the
rate

$$w[m \to (m \pm 1)] = \min\left\{1,\; e^{-[\mathcal{E}(m \pm 1) - \mathcal{E}(m)]/k_B T}\right\}. \qquad (5)$$

Details of the algorithm are outlined in the Appendix.
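
A random walker obeying equation (5) can be sketched as below. This is not the algorithm of the Appendix, which is not reproduced in this section. It is a minimal Metropolis-style walker that assumes trial moves to $m + 1$ and $m - 1$ are proposed with equal probability, that moves past either end of the landscape are rejected, and that the landscape is given in units of $k_B T$. The function name and the `seed` argument are hypothetical.

```python
import math
import random


def simulate_unzipping(landscape, n_steps, seed=None):
    """Metropolis random walk of the unzipping fork on a 1D landscape.

    landscape: list with landscape[m] = E(m) / k_B T for m = 0..M.
    n_steps:   number of Monte Carlo steps to propagate.
    seed:      optional seed for reproducible trajectories.
    Returns the trajectory m(t) as a list of length n_steps + 1.
    """
    rng = random.Random(seed)
    last = len(landscape) - 1
    m = 0                                   # start fully zipped
    trajectory = [m]
    for _ in range(n_steps):
        trial = m + rng.choice((-1, 1))     # assumed symmetric proposal
        if 0 <= trial <= last:              # assumed reflecting ends
            delta = landscape[trial] - landscape[m]
            # Equation (5): accept with probability min(1, exp(-delta)).
            if delta <= 0.0 or rng.random() < math.exp(-delta):
                m = trial
        trajectory.append(m)
    return trajectory


if __name__ == "__main__":
    # Toy landscape: a 6 k_B T barrier on top of a gentle downhill tilt.
    toy = [-0.2 * m + (6.0 if 20 <= m < 25 else 0.0) for m in range(51)]
    path = simulate_unzipping(toy, n_steps=20000, seed=1)
    print("final position:", path[-1], "furthest position:", max(path))
```

The walker in the toy example lingers in front of the barrier before crossing it, which is the mechanism behind the pauses discussed in the text. Trajectories produced this way can be passed to the same window-averaging procedure used for the experimental traces.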

A simulation consists of specifying the FEL (DNA sequence, temperature and reduced force), and propagating 
the MC algorithm for a specified number of steps. The initial condition is such that the walker starts at m = 0, 
representing the experimental circumstance of tracking DNA's that begin as fully zipped. What results is trajectory 
data, m(t), which contains the same information obtained in experiments. Sample theoretical trajectories are shown 
in Figure 7. There are clear pauses and jumps in the trajectories for reduced forces much higher than the critical
force ($-0.5 < f < -0.39$). Only extremely large reduced forces ($f = -5$) are sufficient to remove all barriers and
allow smooth unzipping.

It must be stressed that there is a certain freedom in choosing particular MC algorithms (see the Appendix). Algorithms
can differ in how long it takes random walkers to traverse energy barriers, and thus we do not expect to be able to
compare timescale information quantitatively between experiment and theory. Although pause point locations can
be predicted, we do not expect pause point strengths to match between experiment and theory.

To calculate the theoretical pause point spectra, simulations were performed at a variety of reduced forces, $f$,
and on FELs at the same temperature as the experiments. Each simulated trajectory consisted of 10^ Monte Carlo
steps, and 300 trajectories were used to create the theoretical histograms. Corresponding to the 0.5 μm experimental
resolution, the window averages were taken over 400 base pairs with a length per base pair for lambda phage DNA
under these conditions of 60.9 μm / 48502 bp as discussed in Section II. For $f$ values in which some of the trajectories
reached the fully unzipped state ($m \approx 48{,}502$ bp), the trajectories were cut off past 48,000 base pairs before being
included in the spectra.

V. DISCUSSION 

An examination of a few experimental unzipping trajectories (Figure 5) reveals that the pausing locations are often
encountered by multiple copies of the same DNA. These copies are subject to different realizations of thermal noise, 
but share the same base sequence, an indication that sequence is a strong factor in governing pause point locations. 

Experimental and theoretical pause point locations can be compared by examining the peak positions in the
corresponding pause point spectra (Table II). There is strong agreement between experimental and theoretical pause
point locations at distances less than 6000 bp. In addition, theory predicts a gap in the pause point spectra of ~9000
bp starting at 5400 bp, which is similar in size and location to that observed in experiment. The fact that the
positions of the pause points and gaps match to such a high degree between experiment and theory is evidence that,
for these experimental conditions, the approximations inherent in modeling DNA constant-force unzipping as
dynamics on the FEL are sound. Since the theoretical model only requires
FIG. 7 Sample theoretical unzipping trajectories, $m(t)$, where $\tau = 10000$ steps. (a) $f = -0.26$, (b) $f = -0.39$, (c) $f = -0.50$,
(d) $f = -5$. For $t/\tau > 15$, (a) shows intricate two-state behavior caused by nearly degenerate minima on the FEL. In (d) the
unzipping is smooth, but does not fully unzip in 80,000 time steps, indicating dwell time at some sites. Note that pause points
are reproducible in the simulations, and that very large forces are required to smooth out the large barriers present in lambda
phage DNA and allow smooth unzipping.



Experiment (±200 bp):  1000, 2400, 3400, 4700, 5600, 6100, 7100, Gap = 6400, 13500, 14700
Theory (±200 bp):      ..., 4600, 5400, ..., 14200



TABLE II Experimental vs. theoretical pause point locations (bp) for the first 15000 bp, corresponding to unzipping the 
front half of the lambda phage genome (Figure |SJ). The pause point locations are the positions of the centers of the peaks 
in the pause point spectra (Figure 8), and have error bars of ±200 bp due to experimental resolution. The theoretical pause 
points include those found for f = −0.29, −0.39, −0.47, −0.50. Also listed are the sizes of the gap regions in the spectra where 
no pause points are found. Note that every theoretical peak below 6000 bp is within the error bars of an experimental peak, 
and the theoretical and experimental gaps are roughly the same size and in the same location in the pause point spectra. 
FIG. 8 Experimental (black) and theoretical (grey) pause point spectra for lambda phage at T = 298 K (25°C). Section IlIII 
outlines how the spectra were created using a sliding window average based on a 0.5 μm experimental resolution. Experiments 
were done with F = 15 pN and F = 20 pN, and the spectrum was obtained from 15 experimental traces (see Figure |^). 
Theoretical spectra are at (a) f = −0.29, (b) f = −0.39, (c) f = −0.47, (d) f = −0.50, and were created using 300 traces of 
10^ steps each. 



the base sequence and thermodynamic parameters for base quartet formation, the agreement is strong evidence that pause 
point locations are strongly governed by base sequence. The good agreement also indicates that neglecting bubble formation 
at these temperatures is appropriate, as is also found by other means (Blossey and Carlon, 2003). 

The pause point spectra contain much more information than the pause point locations. Theoretical and 
experimental pause point spectra are shown in Figure 8. The values of f used in the simulations lie in the range 
−0.5 ≤ f ≤ −0.25. Recall that f = −0.15 corresponds roughly to a flat average FEL in the GC-rich region of the 
lambda phage DNA (Figure |SJ). A value of F ≈ 17 pN corresponds to f = −0.37 under these conditions, which is 
within this range. We have compared experimental and theoretical pause points in the front half of the unzipping 
process. The much steeper energy landscape in the back half (see Figure |S}) eliminates most pause points. As the 
values of f are gradually decreased from f = −0.25, the theoretical pause point spectra grow into more peaks at 
larger distances, although low base pair peaks are still preserved. Thus the locations of pause points are fairly robust 
with respect to f values. Once the forces are high enough to allow exploration of the whole FEL, the locations of the 
peaks in the spectra do not change, and the peak areas adjust, reflecting a change in the strength of the pause 
points. 

There are some noticeable disagreements in pause point location between experiment and theory. In particular, 
the experimental doublet peak at 7100 bp was not observed in any theoretical spectra for a variety of parameters, 
including longer simulation times. The pair of peaks centered around 14000 bp in the experimental spectra is also not 
reproduced in the theoretical spectra; rather, a single peak lying in the middle of the experimental peaks is found. This 
most likely does not represent an averaging of the two pause point locations in the theoretical spectrum, because in that 
scenario we would expect a broad peak covering the two locations. The doublet of experimental peaks represents 
FIG. 9 Experimental (black) and theoretical (grey) pause point spectra for lambda phage simulations of 10* steps at 298 K 
(25°C). (a) f = −0.39, (b) f = −0.50. The longer run times in the simulations produce different pause point strengths, but 
similar pause point locations to Figure 8. 



data from two separate runs, and thus a slight miscalibration in the 800 bp/μm conversion factor (Section^) for those 
particular runs could cause the two peaks to separate, since a miscalibration has a larger effect for longer distance 
pause points. For strong enough forces, theoretical pause point spectra also display many more peaks than are present in 
the experimental spectra. It could be that more experimental trajectories need to be included to observe these peaks. 
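
To see why a calibration error matters more at long distances, consider a minimal sketch in which the bp/μm conversion 
factor is off by a fixed fraction. The 4% error and the function name are assumptions chosen purely for illustration; they 
are not values taken from the experiments. 

```python
def apparent_position(true_bp, calibration_error):
    """Position (bp) reported when the bp/um factor is off by a fractional error.

    calibration_error is a hypothetical fractional error, e.g. 0.04 for +4%.
    """
    return true_bp * (1.0 + calibration_error)


# Two runs miscalibrated in opposite directions (assumed +/-4%), probing the
# same true pause site at a short and at a long unzipping distance.
for site in (1000, 14100):
    low = apparent_position(site, -0.04)
    high = apparent_position(site, +0.04)
    print(f"true site {site} bp: apparent separation {high - low:.0f} bp")
```

Under this assumption the apparent separation between the two runs grows linearly with position: about 80 bp at 1000 bp, 
which is within the ±200 bp resolution, but over 1100 bp near 14100 bp, which is comparable to the spacing of the 
experimental doublet. 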

While the peak positions in the experimental and theoretical pause point spectra coincide quite well, the peak 
areas noticeably differ. One source of this discrepancy is the time scales of the DNA unzipping. Each step 
in the Monte Carlo propagation of the unzipping can be thought to occur on the microscopic time scale governing 
the DNA unzipping, which is estimated to be ~ 10~^s (Danilowicz et al., 2003a), with a large error in the exponent 
(Mathe et al., 2004). The accuracy of this figure is not high enough to allow direct comparison of theoretical 
and experimental time scales. As mentioned above, the particular choice of Monte Carlo algorithm can change 
the characteristic unzipping times of simulations and could account for the discrepancy between theoretical and 
experimental time scales. 

Figure 9 shows pause point spectra obtained for simulations of 10* steps. A reduced force of f = −0.39 is not strong 
enough to allow the DNA to unzip within this length of time. Comparing to the 10^ step simulations (Figure 8), we see 
that longer times in this case allow peaks at slightly higher base pair to be observed, but mainly result in a change 
in peak area. For f = −0.50, which allows for some fraction of unzipping even with 10^ step runs, longer runs only 
serve to change peak areas (compare with Figure 8). Longer simulation times do not give spectra that approach the 
experimental pause point spectrum. We thus expect that even longer times will not provide quantitative agreement 
of pause point strength with the current MC algorithm. The Metropolis MC algorithm is an efficient 
way to satisfy the detailed balance condition (A1), but it is not the only choice of MC algorithm. Other 
algorithms can give different pause times at pause points, which result in different peak areas in pause point spectra. 

The discrepancy of unzipping timescales between experiment and theory is also reflected in the fact that multiple 
values of f needed to be used to obtain the theoretical pause point information. At low f values, theoretical simulations 
could not pass certain barriers of the FEL, as is seen in the abrupt cutoff of peaks in Figure 8(a-c), which necessitated 
further tilting of the landscape. The technique of increasing the force is also used in experiment to probe farther-out 
regions of the unzipping landscape (Figure 3). However, the range of reduced forces used in the theoretical simulations 
corresponds roughly to 15-17 pN, while the experimental range is roughly 15-30 pN (Danilowicz et al., 2003b). 

We might expect discrepancies between experimental and theoretical pause point locations to be due to large 
A-rich regions in the genome, since these are more susceptible to bubble formation due to the weaker base pairing and 
stacking interactions (Table I). Thus further investigation into these pause point discrepancies can lead to interesting 
genomic information. Experiments involving higher temperatures and different ionic conditions will help elucidate 
these discrepancies (Danilowicz et al., 2003b). 

This procedure to investigate pause points in DNA unzipping is easily extended to the study of other genomes since 
all that is required is the base sequence and temperature of interest. As an example, we theoretically investigated the 
FIG. 10 Free energy landscape for the BP-φX174 genome at f = −0.15, T = 298 K (25°C), corresponding to F ≈ 16 pN. A 
closer view of the approximately horizontal plateau region is given in the inset. 
FIG. 11 Sample simulation trajectories displayed below the relevant segment of the BP-φX174 FEL at f = −0.25 (25°C). The 
three trajectories start at the left of the figure at m = 600 bp, t = 0. Simulation time has units τ = 5385 steps, corresponding 
to the genome length. Pause points are denoted by arrows on the FEL. Note that very intricate multi-state behavior is seen 
in the walker trajectories. In particular, the region from 725-775 bp contains several minima of the same depth 
on the FEL, which show up as oscillations in the trajectories. 



pause point unzipping spectrum for the microvirus Bacteriophage Phi-X174 (BP-φX174) (M = 5386). The FEL for 
BP-φX174 has barriers that are on the order of √M, and is a good example of a landscape which can be approximated 
by an integrated random walk (Figure 10). Figure 11 plots several sample simulation trajectories alongside a segment 
of the BP-φX174 FEL for f = −0.25, and Figure 12 plots pause point spectra for several values of f, all at T = 298 K 
(25°C). Once again we see that for low values of f, the spectra grow into peaks at higher base pair as the magnitude of f 
is increased. A value of f = −0.45 is large enough to cause unzipping, and we can see that the spectrum at this value 
has contributions from the whole surface. For BP-φX174, f = −0.45 corresponds to F ≈ 17 pN at 298 K (25°C). 



The BP-φX174 genome can be found at http://www.ncbi.nih.gov/ with sequence accession number NC_001422. 
FIG. 12 Theoretical pause point spectra for BP-φX174 at T = 298 K (25°C): (a) f = −0.15, (b) f = −0.20, (c) f = −0.25, 
(d) f = −0.45. Each spectrum was created using 300 traces of 10^ steps each. A value of f = −0.45 corresponds to F ≈ 17 pN. 



VI. CONCLUSION 



We have presented experimental evidence that the dynamics of DNA constant-force unzipping are not smooth, but 
rather display characteristic pauses and jumps. Furthermore, we have given strong evidence that the locations of 
these pauses are primarily governed by the DNA sequence at room temperature (25°C). We have also presented a 
general scheme for computing pause point spectra for any DNA sequence, with the only inputs being the sequence, 
temperature, and a set of ten empirical parameters representing DNA duplex stability. 

The ideas presented above can be applied to any system in which the concept of a pause point can be well defined, or 
in which a 'spectrum' representation of trajectory data can be useful in other ways. We can then enumerate the steps 
involved in constructing a theoretical representation of the system in order to facilitate comparison with experiments: 

1. Using chemical intuition, reduce the system to one degree of freedom. Equilibrium statistical mechanics can be 
used to justify, or derive, the resulting FEL description of the system. 

2. To model experiments, use Monte Carlo simulation with the appropriate algorithm to create theoretical trajec- 
tories. 

3. Compute trajectory spectra using the above procedure for both experimental and theoretical trajectories. 

Such systems, of which DNA constant force unzipping is one, also include topics of current interest such as the motion 
of molecular motors on biopolymers (Davenport et al., 2000; Keller and Bustamante, 2000; Neuman et al., 2003; 
Perkins et al., 2004; Wang et al., 1998). 
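
As a rough illustration of the third step in the list above, the sketch below turns a set of trajectories m(t) into a 
smoothed occupancy spectrum. It is a stand-in built on stated assumptions, not the procedure used for the figures: the 
function name, the dwell-time histogram, and the 400 bp default window (roughly 0.5 μm at 800 bp/μm) are all 
hypothetical choices made for this example. 

```python
def pause_spectrum(trajectories, n_sites, window=400):
    """Sliding-window average of the fraction of time spent at each position.

    trajectories: iterable of sequences of integer fork positions m(t).
    n_sites:      number of allowed positions (0 .. n_sites - 1).
    window:       smoothing width in bp (assumed stand-in for the resolution).
    """
    counts = [0] * n_sites
    total = 0
    for traj in trajectories:
        for m in traj:
            counts[m] += 1          # one time step spent at position m
            total += 1
    if total == 0:
        raise ValueError("no trajectory data supplied")

    # Prefix sums let each window be summed in constant time.
    prefix = [0]
    for c in counts:
        prefix.append(prefix[-1] + c)

    half = window // 2
    spectrum = []
    for m in range(n_sites):
        lo = max(0, m - half)
        hi = min(n_sites, m + half + 1)
        spectrum.append((prefix[hi] - prefix[lo]) / ((hi - lo) * total))
    return spectrum
```

Positions where trajectories dwell for many time steps appear as peaks, and the same function can be applied to 
experimental and simulated traces so that the two spectra are directly comparable. 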
APPENDIX A: Monte Carlo Algorithm 

The Monte Carlo technique is designed to sample an ergodic system according to the equilibrium distribution for 
long simulation times. The distribution is specified by the detailed balance condition 

$$\frac{W_{m \to m+1}}{W_{m+1 \to m}} = e^{-(\mathcal{E}(m+1) - \mathcal{E}(m))/k_B T} \tag{A1}$$

where $W_{m \to m+1}$ is the rate of taking the step from m to m + 1 base pairs unzipped. The ratio on the right hand side of 
(A1) ensures relaxation to the Boltzmann distribution for long times (Newman and Barkema, 1999). Specifying the 
distribution, and thus the detailed balance criterion, still offers a large degree of flexibility in choosing an algorithm. 
Our goal in this study is to be able to predict the pause points of the DNA unzipping process, and to this end, we 
expect many choices of Monte Carlo algorithm to give equivalent pause points. The simplest algorithm to achieve 
detailed balance is known as the Metropolis criterion (Newman and Barkema, 1999). For an unzipping fork located 
at m, 

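1. Choose a direction to move (m + δ, δ = ±1). 

2. If $\mathcal{E}(m + \delta) - \mathcal{E}(m) < 0$, accept the move and GOTO 1. 

3. If $\mathcal{E}(m + \delta) - \mathcal{E}(m) > 0$, accept the move with the probability given by the Boltzmann factor, 
$e^{-(\mathcal{E}(m + \delta) - \mathcal{E}(m))/k_B T}$, and GOTO 1. 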

In order to prevent the random walkers from trying to unzip (rezip) beyond the end (beginning) of the dsDNA 
strand, we artificially inserted infinite barriers to these transitions in the simulations. 
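
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather 
than the code used for the simulations: the function name, the `energy` list (a hypothetical free energy landscape indexed 
by the number of unzipped base pairs), and the choice to count a rejected move as an elapsed time step are all 
assumptions made for the example. 

```python
import math
import random


def metropolis_unzip(energy, n_steps, kT=1.0, start=0, seed=None):
    """Return the fork position m(t) for a 1D Metropolis walk on a landscape.

    energy[m] is the free energy with m base pairs unzipped (hypothetical input),
    in the same units as kT.
    """
    rng = random.Random(seed)
    m = start
    last = len(energy) - 1
    trajectory = [m]
    for _ in range(n_steps):
        delta = rng.choice((-1, 1))            # step 1: choose a direction
        trial = m + delta
        if 0 <= trial <= last:                 # infinite barriers at both ends
            dE = energy[trial] - energy[m]
            # steps 2 and 3: downhill moves are always accepted,
            # uphill moves with the Boltzmann factor exp(-dE / kT)
            if dE <= 0 or rng.random() < math.exp(-dE / kT):
                m = trial
        trajectory.append(m)                   # rejected moves still cost a step
    return trajectory


# Usage: a steadily downhill toy landscape, on which the fork runs to the end.
toy_landscape = [-1.0 * i for i in range(50)]
trace = metropolis_unzip(toy_landscape, n_steps=2000, kT=0.01, seed=1)
print(trace[-1])
```

Recording the position after every attempted move, including rejections, is what makes dwell times at deep minima 
visible in the resulting trajectory. 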